SCANNING  AND  MONITORING  PERFORMANCE:
EFFECTS  OF  THE  REINFORCEMENT  VALUES 
OF  THE  EVENTS  BEING  MONITORED 


INTRODUCTION 

Scanning and monitoring errors may increase if Air
Traffic Control Specialists (ATCSs) focus or “lock”
their attention onto a limited area of their control
display to the exclusion of other relevant parts of the
display. Such problems are seldom discussed in standard
aviation references (1), but have received attention
in the visual process and control literature (2).
Because such errors can seriously compromise air safety,
we initiated research to identify factors that could
affect the occurrence of “locking” behavior and the
concomitant error rate. The reviewed literature yielded
no studies that appeared immediately relevant to the
effects of target values, error costs, and rewards or
penalties (reinforcements) on monitoring performance.
Therefore, we decided to investigate whether target
value or error costs are demonstrable factors in inducing
locking. Such information may be useful in reducing
the frequency of scanning errors by revising training
protocols or personnel selection criteria.

Given a test where two work areas had similar task
difficulty, but sharply different penalties for an error,
we hypothesized that a reward for good performance
would tend to cause locking on the work area with the
higher penalty.

METHODS 

Equipment 

A locally developed character recognition and scanning
performance test system was used to generate the
display as well as record and categorize subjects’ (Ss)
responses.

All programs were written in Borland’s Turbo
Pascal™ programming language and run on a standard
IBM PC-AT™ microcomputer with an 8 MHz clock,
a standard EGA adapter, and a 13" diagonal, 640 by 350
pixel color monitor. A previous study (3) showed that
the test produced results that were congruous with
those found in other, more complex, tests of character
recognition and scanning performance.

Task  Description 

The test required the Ss to visually monitor two 100
by 100 pixel “work areas” horizontally aligned either 3
or 12 deg. of arc apart, inner-edge to inner-edge along
the midline of the computer display, as shown in
Figure 1.

Each work area was filled with a changing random
dot pattern, each dot being one pixel. As the dots were
replaced, they slowly overwrote the whole work area.
The dot replacement rate was 750 pixels per second per
work area. At random intervals, the characters S, B, 0,
3, 5 and 8, which share some similar shape characteristics
(4), were written somewhere in each work area
within a 7 by 7 pixel array. The Ss were to indicate with
simple keyboard inputs when, and in which work area,
the target character “5” appeared before the next character
was written to the same work area. When Ss made
either type of error (a missed target or a response to a
non-target character), the system provided feedback
by sounding a short beep or tone through the system
speaker.

In Figure 1, the “3” near the center of the left work
area was written 0.5 sec. before the frame was captured.
It is still clearly legible. The top center of the same
work area shows the remains of another “3”, and the
bottom right corner of the right work area shows the
remains of a “5,” both written 2.5 sec. before the screen
was captured. Both of these characters are on the verge
of becoming illegible.

At the intended viewing distance of 60 cm., each pixel
subtended 2 min. of arc, the character 15 min. of arc, and
each work area 3 deg. 20 min. of arc. Observed viewing
distances varied between 40 and 70 cm. Preliminary
attempts to fix the S’s head position to provide a relatively
constant viewing distance proved impractical, considering
the length of the 80-minute test period.
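
The visual angles quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original test system: it assumes only the stated figure of 2 min. of arc per pixel at 60 cm, derives the implied pixel pitch, and reports how the angle changes over the observed 40 to 70 cm viewing range. The function names are illustrative.

```python
import math

# Values taken from the text; everything derived from them is an estimate.
ARCMIN_PER_PIXEL = 2.0      # one pixel at the intended viewing distance
DESIGN_DISTANCE_CM = 60.0   # intended viewing distance


def pixel_pitch_cm():
    """Pixel pitch implied by 2 min. of arc per pixel at 60 cm."""
    angle_rad = math.radians(ARCMIN_PER_PIXEL / 60.0)
    return DESIGN_DISTANCE_CM * math.tan(angle_rad)


def subtended_arcmin(n_pixels, distance_cm=DESIGN_DISTANCE_CM):
    """Visual angle, in min. of arc, of a run of n_pixels seen head-on."""
    size_cm = n_pixels * pixel_pitch_cm()
    angle_rad = 2.0 * math.atan(size_cm / (2.0 * distance_cm))
    return math.degrees(angle_rad) * 60.0


def as_deg_min(arcmin):
    """Split a rounded angle in min. of arc into (degrees, minutes)."""
    return divmod(round(arcmin), 60)


if __name__ == "__main__":
    print("work area at 60 cm:", as_deg_min(subtended_arcmin(100)))
    print("7-pixel character at 60 cm:", round(subtended_arcmin(7)), "min")
    for d in (40.0, 70.0):
        print(f"work area at {d:.0f} cm:", as_deg_min(subtended_arcmin(100, d)))
```

Under these assumptions a 100-pixel work area comes out at 3 deg. 20 min., matching the text, while a 7-pixel character comes out at about 14 min. of arc, close to the quoted 15 min.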

Figure  1

Example  of  screen  display  for  3°  separation  (shown  in  reverse  contrast 
for  clarity). 

Table  1.  Experimental  design  and  protocol.

Procedure

In this system, task difficulty, or “workload,” is a
function of the separation between work areas, and
symbol presentation rates within each work area. Work
area spacings and symbol presentation rates were varied
according to a simple Latin square design, as shown
in Table 1. This enabled us to test for, and evaluate,
learning or fatigue effects and any serial order interactions,
none of which became evident.

Since the symbol presentations were controlled by a
random number generator, the total number of symbols
(including targets) and the number of target
symbols presented in each test segment were not constant
for all work areas and segments nor for all Ss.
Based on the data for one experiment, the combined
number of symbol presentations per segment for the
two work areas ranged from 1005 to 1007, with
individual work areas ranging from 500 to 518 symbols
each. The number of target presentations was
somewhat more variable, ranging from 70 to 134 per
work area, or 239 to 254 per segment.

Therefore, to equalize data structure for all Ss, all
detection and recognition error data were expressed as
a percentage of the number of symbols presented in
each test segment. We were primarily interested in the
comparison of detection and recognition error frequencies
for each S. Since the percentage data distributions
were non-normal, all statistical analyses were run
using the non-parametric Wilcoxon Matched Pairs
Test, as implemented with the NCSS™ statistical
program set.
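
The normalization and the matched-pairs comparison can be illustrated as follows. This is a minimal sketch in Python, not the NCSS implementation used in the study: it converts error counts to percentages of symbols presented and computes the Wilcoxon signed-rank sums for paired observations, dropping zero differences and giving tied absolute differences their average rank. It does not compute a p value; the function names and the example data in the usage note are hypothetical.

```python
def error_percent(n_errors, n_symbols):
    """Errors expressed as a percentage of the symbols presented."""
    return 100.0 * n_errors / n_symbols


def wilcoxon_signed_rank(xs, ys):
    """Return (W_plus, W_minus, n) for paired samples xs and ys.

    Zero differences are dropped; tied absolute differences share
    the average of the ranks they would otherwise occupy.
    """
    diffs = sorted((x - y for x, y in zip(xs, ys) if x != y), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)
```

For example, with hypothetical per-S detection error percentages of 10, 12, 9 and 15 in one work area and 8, 12, 10 and 11 in the other, the call `wilcoxon_signed_rank([10, 12, 9, 15], [8, 12, 10, 11])` drops the tied pair and returns rank sums of 5 and 1 over three pairs. The smaller rank sum would then be compared against a critical value for that number of pairs.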

Detection and recognition error classification was
critical to these experiments. If a S locked onto any one
work area, there should be a relative increase in detection
errors (misses) in the other work area, since
symbols occurring there would not be seen. However,
recognition errors in both work areas would be similar
since, once a symbol is seen, there should be no
difference in the symbol recognition and response
selection processes in the two work areas.

Errors were classified as detection errors when there
was no response between a target symbol presentation
and the display of the next symbol in any one work
area. We assumed that the Ss had not detected that a
target symbol was present. Errors were classified as
recognition errors whenever a response was made to a
non-target character. In this case, we assumed that the
S had seen a symbol but had not accurately recognized
it, or had made an error in recognition-response coupling.
At the end of the test session, the program
calculated a score, which was displayed for each S’s
information and for reward calculations.
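
The classification rule just described can be sketched in a few lines. The following Python fragment is an illustration only, not the original Turbo Pascal program: it assumes that, for one work area, symbol onsets and keyed responses are available as sorted time lists, and it treats the window after the last symbol as open-ended. The data layout and names are hypothetical.

```python
from bisect import bisect_left

TARGET = "5"  # the target character named in the text


def classify_errors(presentations, response_times):
    """Count (detection_errors, recognition_errors) for ONE work area.

    presentations: list of (onset_time, character), sorted by onset.
    response_times: sorted times of responses keyed for this work area.
    A symbol's response window runs from its onset up to the onset of
    the next symbol in the same work area (assumed open-ended for the
    last symbol).
    """
    detection = 0
    recognition = 0
    for i, (onset, char) in enumerate(presentations):
        if i + 1 < len(presentations):
            window_end = presentations[i + 1][0]
        else:
            window_end = float("inf")
        first = bisect_left(response_times, onset)
        responded = (first < len(response_times)
                     and response_times[first] < window_end)
        if char == TARGET and not responded:
            detection += 1      # target shown, no response: a miss
        elif char != TARGET and responded:
            recognition += 1    # response made to a non-target character
    return detection, recognition
```

With symbols "3", "5", "5", "8", "S" presented at 0, 1, 2, 3 and 4 sec. and responses at 1.4 and 3.2 sec., the second "5" is counted as a detection error and the response to the "8" as a recognition error.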

Subjects 

Thirty-two paid volunteer Ss were used in each
experiment. The purpose and nature of the experiment
were explained to them, and they were advised that they
could withdraw from participation at any time. They
were tested to ensure that their corrected visual
acuity was 20/25 or better. Their right/left eye dominance
was also determined so that any potential positional
effects could be tested for and evaluated. Their ages
ranged from 18 to 30 years; in all, there were 67 men
and 61 women (32 Ss in each of the four experiments).

Design  of  Experiments 

Four sets of experiments were executed using the
described test system:

Experiment 1. This tested the hypothesis that overall
performance reinforcement would have little effect.

The Ss were rewarded if their percent of correct responses
to the total number of target presentations at
the end of the 80-minute test period was above the
group median, based on prior unrewarded test runs
during an earlier study (3). We set identical task
difficulties and error penalties of one point per error in
the work areas. We used two groups of 16 Ss, one
group composed of Ss having had some previous exposure
to an identical display, the other of Ss who had no
such prior experience.

Experiment 2. The second experiment tested the
hypothesis that, to minimize penalties, the Ss would
tend to lock onto the work area having the greatest task
difficulty. It was identical to the first experiment,
except that the work area task difficulties differed; the
high-load work area had symbol and dot replacement
rates twice that of the low-load work area. Though the
high-load and low-load work areas were not specifically
identified, preliminary tests demonstrated that
the Ss determined (within seconds) which was which
by the relative speed of the dot and character replacement
rates.

Experiment 3. This tested our primary hypothesis
that Ss would tend to lock onto the work area where
errors were penalized the most in order to maximize
their score (performance) and thus improve their
chances of earning a reward (performance bonus). The
protocol was identical to that in Experiment 1 above,
except that:

(i) The performance bonus was split so that Ss
scoring in the top quartile of a representative
range of scores obtained from previous experimental
sessions got a bonus equal to 50% of
their guaranteed minimum earnings, while
those in the next quartile received 25%;

(ii) Four points per error were deducted from the
S’s score in the clearly marked high-value work
area, while one point per error was deducted in
the low-value work area.

Experiment  4.  This  was  identical  to  Experiment  3 
except  that  10  points  per  error  were  deducted  in  the 
high-value  work  area  and  1  point  per  error  in  the  low- 
value  work  area. 
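
The penalty and bonus rules of Experiments 3 and 4 can be made concrete with a small sketch. The text does not give the full scoring formula, so the Python fragment below computes only the points deducted for errors and the bonus fraction implied by the quartile rule. The function names, the quartile cutoff arguments, and the treatment of a score exactly at a cutoff are assumptions made for illustration.

```python
def penalty_points(errors_high, errors_low, high_weight, low_weight=1):
    """Points deducted for errors in the high- and low-value work areas.

    high_weight is 4 in Experiment 3 and 10 in Experiment 4; both
    areas carried 1 point per error in Experiments 1 and 2.
    """
    return errors_high * high_weight + errors_low * low_weight


def bonus_fraction(score, top_quartile_cutoff, second_quartile_cutoff):
    """Bonus as a fraction of guaranteed minimum earnings.

    The cutoffs are hypothetical inputs standing in for the quartile
    boundaries of scores from previous sessions; scores exactly at a
    cutoff are assumed to qualify.
    """
    if score >= top_quartile_cutoff:
        return 0.50
    if score >= second_quartile_cutoff:
        return 0.25
    return 0.0
```

For instance, 10 high-value and 20 low-value errors cost 60 points under the 4:1 weighting. A S who cut high-value errors to 5 at the price of 30 low-value errors would lose only 50 points despite making more errors in total, which is the incentive for locking that these experiments were designed to test.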

Upon completion of the vision testing, the S was
seated at the display station and the general room
lighting was dimmed. Pre-test instructions, including
the possibility of a bonus for good performance, were
read and explained to the S. The Ss were advised to pay
constant attention to the display and ignore any distractions
to improve their chance for earning a bonus.
Based on the results from the earlier study (3), which
showed the task to be unaffected by practice or previous
experience, Ss were not given any task training or
practice prior to the experimental task.

RESULTS  AND  DISCUSSION 
General  Findings 

There was no significant correlation (at the p = 0.10
level) between any performance variable and an
S’s age, sex, corrected visual acuity, or right/left eye
dominance. There was also no significant correlation
between an S’s detection and recognition error performance.
That is, for any S, a high or low error rate in the
one did not necessarily mean a similar rate in the other.
A summary of our findings is presented in Table 2.

Experiment  1 

This experiment confirmed the test hypothesis. The
reward (reinforcement) did not significantly affect
performance when compared to results from our previous
studies, where good performance was not rewarded.
This conforms with a number of other studies
also indicating that human task performance may not
be directly affected by a delayed reinforcement paradigm.
In effect, the immediate problems of task performance
override any awareness of rewards to be earned
“later.”

             Avg. total   Avg. total   Low vs. high D/E*   Low vs. high R/E*
Experiment   % D/E*       % R/E*       significant         significant
             (rounded)    (rounded)    at p < 0.10?        at p < 0.10?

    1         9 ± 3        9 ± 2       No                  No
    2        19 ± 5        9 ± 3       No                  No
    3        22 ± 6       10 ± 3       Yes                 No
    4        21 ± 6       10 ± 4       Yes                 No

Table 2. Summary of results.

* D/E = Detection Errors.
  R/E = Recognition Errors.
  Low/High D/E = Low/High Workload/Value Detection Errors.
  Low/High R/E = Low/High Workload/Value Recognition Errors.

There  were  also  no  significant  differences  in  error 
rates  between  the  two  work  areas,  nor  were  there 
significant  differences  in  performance  between  the  16 
experienced  and  16  inexperienced  Ss. 

Experiment  2

The results contradicted the test hypothesis. Differing
workloads had no effect on locking. There was no significant
difference in either detection or recognition errors
between the high- and low-workload work areas.


The results suggest, as in the first experiment, the
relative ineffectiveness of a delayed reward paradigm
for this protocol. That is, the task was difficult enough,
or interesting enough, to fully occupy the Ss’ attention,
and they were not really aware of the
connection between their actions and the promise of
a reward at the end of the test. It is also possible that the
Ss did not perceive the workload differential, a factor
of about two, as significant enough to elicit the anticipated
differential attention.

Experiment  3 

The results partially confirmed the hypothesis that
the Ss would tend to lock onto the high error value
work area. The percentage of detection errors was
statistically higher in the low-value work area (p <
0.004). The absence of significant differences in recognition
errors between the two work areas remains to be
explained. This was true for most Ss, but not all. This
difference in individual scanning strategies deserves
close attention in future research, as it may reflect basic
differences in scanning ability.

Figure 2

The Y axis was formed from the difference between an S's low- and high-value work
area recognition error percentages. The X axis was similarly calculated, except it
applies to detection error rates. Positive values mean that low-value work area errors
were greater than those for the high-value work area.

In addition, the percentage of detection errors was
roughly double the percentage of recognition errors in
both work areas. This difference was also significant (p
< 0.001), suggesting that the workload was sufficiently
high to require the S’s full attention for performance.
That is, the time required to detect, recognize and
respond to a target was long enough that some targets
were not detected in the time window available, irrespective
of work area. Therefore, boredom, or overall
inattention, probably were not major factors in producing
the performance differences seen.


We also studied the distribution of detection and
recognition errors for each S. In Figure 2, we have
plotted, on an XY graph, the differences between
low- and high-value work area detection and recognition
error percentages. The range of error differences,
for most Ss and for both error types, is about
± 4. Nine Ss show a markedly high percentage of
low-value work area detection errors. These differences
among Ss may reflect differing score maximizing
strategies, which are influenced and
controlled by learning. However, the data may
indicate that the increased tendency toward locking
behavior reflected significant individual ability differences.
If confirmed, this could provide a practical
methodology for personnel selection in the air
traffic control system.
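
The plotted quantities can be written out explicitly. The Python sketch below computes one S's coordinates as described for Figure 2 (low-value minus high-value error percentage, detection on X and recognition on Y) and flags Ss whose detection difference exceeds the ± 4 band in which most Ss fell. The text does not state the criterion used to identify the "markedly high" Ss, so the threshold and the function names here are assumptions for illustration.

```python
def figure2_point(low_det_pct, high_det_pct, low_rec_pct, high_rec_pct):
    """Return (x, y): low- minus high-value error percentage differences.

    x applies to detection errors, y to recognition errors. Positive
    values mean more errors in the low-value work area.
    """
    return (low_det_pct - high_det_pct, low_rec_pct - high_rec_pct)


def shows_locking_tendency(point, band=4.0):
    """True if the detection difference lies above the typical band.

    The default band of 4 percentage points is an assumed cutoff taken
    from the range reported for most Ss, not a criterion from the text.
    """
    x, _ = point
    return x > band
```

For example, a hypothetical S with 15% low-value and 9% high-value detection errors and equal recognition error percentages plots at (6, 0) and would be flagged, whereas an S at (1, -1) would not.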


Figure  3 

Results  for  Experiment  4  arranged  as  in  Figure  2,  but  with  modified  axis  scales. 

Experiment  4

The results are graphed in Figure 3. The data
confounded both predictions. Again, most Ss clustered
in the ± 4 error range and there was a significant
difference between high- and low-value work area
detection error rates, but at a lower significance level (p
< 0.02) than in Experiment 3. This was due to a
decrease in the overall frequency of detection errors in
the low-value work area (p < 0.01). Four Ss in Experiment
4 showed a markedly high percentage of low-value
detection errors, as against 9 in Experiment 3.
There were also no significant differences between
Experiment 3 and Experiment 4 session average recognition
or detection error rates, except for the shift
caused by the fewer Ss having high detection error rates
in the low-value work area.

The reason for the different results in Experiments
3 and 4 is more difficult to explain. Perhaps the Ss did
not see much difference between a 4-point and a 10-point
error penalty. Even so, the reduction in the
number of Ss with locking is still puzzling. One possible
explanation is that the “motivation” of the Ss
recruited for this test series differed from that of those
recruited for Experiment 3. However, their overall scores
and recognition error incidence were similar to the
Experiment 3 data. Thus, their motivation and attention
did not seem to differ from the other Ss. It may be
that these results reflect slightly different scanning or
working strategies adopted by the Ss in Experiments 3
and 4. For such relatively small groups, some group-to-group
variability ought to be expected, even if each
sample group’s results seem normally distributed, as
these were.

Since 9 Ss in Experiment 3, and 4 Ss in Experiment
4, did show an unusual tendency toward locking, there
are clearly differences in scanning ability, or perhaps
motivation, among our Ss. Present data do not permit
an exact estimate of their prevalence. After allowances
for possible population distribution variability, we
estimate from our results that about 15% of the general
population may exhibit this locking tendency. Thus,
testing for scanning ability and any tendency towards
locking could be a valuable addition to current personnel
test and evaluation procedures.

There is also a general cautionary note in these
results. Most studies of human scanning or monitoring
performance use relatively few Ss. Indeed, many of
the papers reviewed (e.g., 5, 6, 7) used fewer than 5 Ss.
We do not believe that the results of such studies
should have the general applicability claimed of them
without extensive replication with adequate numbers
of subjects.

CONCLUSIONS 

About 15% of our Ss showed a marked tendency to
concentrate on a display sub-area containing very high
value events, while periodically ignoring events elsewhere
on the display. This suggests that there may be
significant individual differences in ability or strategies
to effectively scan/monitor complex screen displays
over long time periods. Tests for such differences
could be useful for personnel selection and retention
purposes in the Air Traffic Control System.